\documentclass[conference]{IEEEtran}
\usepackage{fullpage,graphicx,psfrag,url,biblatex}
\usepackage[english]{babel}% Recommended
\usepackage{csquotes}% Recommended
\usepackage{subfigure}
\addbibresource{references.bib}
\renewcommand*{\bibfont}{\scriptsize}

\title{Planning Under Uncertainty for Mapping}
\author{Team 7}
	%\addtolength{\oddsidemargin}{-0.35in}
	%\addtolength{\evensidemargin}{-0.35in}
	%\addtolength{\textwidth}{0.7in}

	%\addtolength{\topmargin}{-.875in}
	%\addtolength{\textheight}{1.75in}
	%\addtolength{\textheight}{1.75in}
	%\addtolength{\intextsep}{-0.1in}
	%\addtolength{\dblfloatsep}{-0.5in}
	%\addtolength{\abovecaptionskip}{-0.1in}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}

\fontsize{9.1}{11}\selectfont
\maketitle

%\section*{Multi-agent path planning algorithm for future Air Traffic Management}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The Active SLAM problem}
A central problem in deliberative robotics is mapping an unknown world. Search-and-rescue robots enter an unknown disaster area and must generate a map as they explore. Without an accurate map of the world, a robot cannot effectively reason about future actions. Simultaneous Localization and Mapping (SLAM) is a successful technique for mapping an unknown environment with relatively dense, unique features. Acquiring sensor data over the whole map, however, usually requires human intervention to cover unexplored areas. This process does not scale as intelligent robots become ubiquitous. Offline algorithms cannot determine a strategy that completely covers the map, and the paths they compute are far from optimal. Recently, online algorithms have attempted to actively choose the next observation point, based on the data already recorded, to optimize coverage and accuracy.\\
\begin{figure}[htbp]
	\centering
		\includegraphics[scale=0.21]{image/MDP3EndLarge.png}
	\caption{Example robot exploration using SPLAM}
	\label{fig:example}
\end{figure}
Under real-world time constraints, however, a robot must trade off coverage against accuracy of an unknown area in order to maximize information gain while dealing with imperfect and incomplete information. To be effective in online mapping, a robot must be able to evaluate the information potentially gained by closing the loop or by moving to a new observation point, as shown in Fig.~\ref{fig:example}. We address the time-constraint issue, the choice of an uncertainty measure, and the exploration-exploitation trade-off in Active SLAM.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Related work}
Active SLAM is a particular instance of Active Sensing~\cite{survey}, which addresses the problem of optimally controlling a sensor for information gathering. Active SLAM computes an exploration policy -- a strategy -- for a robot running a SLAM algorithm in an unknown environment. The aim is to obtain a policy that trades off exploration (maximizing map coverage) against exploitation (returning to known locations to increase the accuracy of the map). The first approaches developed for this hard task were greedy algorithms~\cite{greedy} that use the current knowledge to maximize the information gained by the next measurement; planning with a longer horizon, however, can improve results.\\

The most general way to describe the policy is as a sequence of freely chosen actions. The problem of finding a sequence of actions that minimizes cost for a system whose state is fully observed is a Markov Decision Process (MDP). When the measurements are noisy or do not fully observe the state, the problem becomes a Partially Observable Markov Decision Process (POMDP). Active SLAM is naturally a continuous-state, continuous-action POMDP, and solving a POMDP in that formulation is computationally intractable. Nevertheless, several authors have solved it through approximation or discretization. In~\cite{POMDPbayes}, the authors use Bayesian optimization of a Gaussian process; in~\cite{POMDPreinforcement}, model-free reinforcement learning is applied. While these approaches produce high-quality paths, their computation is expensive.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Approach}
\subsection{Implementation Overview}
Instead of trying to solve the POMDP directly, we implemented our solution using Model Predictive Control (MPC). In~\cite{RaoMPC,MPCAOptimal,MPC}, the authors use MPC optimization with a multiple-step look-ahead to decide which next action maximizes the information gain. To plan its next actions, the robot simulates possible moves and predicts the resulting sensing. To simulate its moves, the robot chooses from a set of discretized actions; to predict the sensing, it uses its estimated map of the known features and a noisy sensor model. The robot thus builds a tree of possible paths from its current position and predicts the observations along these trajectories. These predicted observations are used to update the state of a Kalman Filter (KF). The first action of the trajectory that leads to the maximum information gain is then selected for execution. After that move, the state of the KF is updated according to the actual sensing, and the MPC search is run again to find the next action.\\
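The look-ahead loop above can be sketched as follows. This is a minimal illustration, not MRPT's API: the EKF covariance is reduced to a scalar trace, and the sensing prediction is a toy model; all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical simplified state: robot pose plus a scalar stand-in for the
// trace of the EKF covariance. The real implementation carries the full
// covariance matrix through the EKF update.
struct State {
    double x, y, heading;
    double trace;
};

// One discretized action: a (speed, turn-rate) pair held for one time step.
struct Action {
    double v, w;
};

// Toy prediction: move the robot and shrink the uncertainty proportionally
// to the distance traveled (a placeholder for simulating the sensing and
// the resulting EKF update along the trajectory).
State predict(const State& s, const Action& a, double dt) {
    State n = s;
    n.heading += a.w * dt;
    n.x += a.v * dt * std::cos(n.heading);
    n.y += a.v * dt * std::sin(n.heading);
    n.trace = s.trace * (1.0 - 0.1 * a.v * dt);  // assumed sensing model
    return n;
}

// Best (smallest) trace reachable within `depth` steps from state s, found
// by exhaustively expanding the tree of discretized actions.
double bestTrace(const State& s, const std::vector<Action>& actions,
                 int depth, double dt) {
    if (depth == 0) return s.trace;
    double best = std::numeric_limits<double>::max();
    for (const Action& a : actions)
        best = std::min(best,
                        bestTrace(predict(s, a, dt), actions, depth - 1, dt));
    return best;
}

// MPC step: pick the first action of the trajectory that minimizes the
// predicted trace over the horizon; only that first action is executed.
Action selectAction(const State& s, const std::vector<Action>& actions,
                    int horizon, double dt) {
    Action bestAct = actions.front();
    double best = std::numeric_limits<double>::max();
    for (const Action& a : actions) {
        double t = bestTrace(predict(s, a, dt), actions, horizon - 1, dt);
        if (t < best) { best = t; bestAct = a; }
    }
    return bestAct;
}
```

After the chosen action is executed and the EKF is updated with the actual measurements, `selectAction` is simply called again from the new state, which is the receding-horizon character of MPC.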

The trade-off between exploration (visiting unexplored areas) and exploitation (reducing map uncertainty) is made by switching the current goal of a state machine that guides the robot's move generation. A goal is represented by a virtual feature with large uncertainty, which draws the robot toward it. During exploration, the current goal is set to a point in an unexplored area of the map; during exploitation, it is set to an already visited location to improve the quality (reduce the uncertainty) of a landmark or of the robot state. Below we describe the implementation of our main contributions.

\subsection{Framework}
We used the Mobile Robot Programming Toolkit (MRPT) as our simulation framework~\cite{blanco2008derivation}. It is an open-source toolkit designed to provide simulation tools for rapid robot algorithm development in C++. Modules include mapping, sensing, robot dynamics modeling, and a SLAM framework, as well as an OpenGL-based 3D visualization tool to display the robot in its environment. Using MRPT, we implemented a basic robot model that is controlled with rotational and linear velocities; the simulation uses a PID controller to reach the commanded velocities, and the pose of the robot is updated every millisecond. We inject Gaussian noise into the odometry readings to simulate imprecise ego-motion sensing. The robot's sensor provides distance and bearing measurements for all landmarks in its field of view, again with injected Gaussian noise to simulate imprecise sensor readings. Finally, SLAM is performed by an Extended Kalman Filter; we used the implementation of the MRPT::SLAM package. We also implemented a time-step simulation that updates the robot's position within the world at discrete time steps; moving and sensing commands are sent every time step.
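The odometry corruption can be sketched as below; the standard deviations match Table~\ref{tab:SimulationParameters}, while the struct, the function name, and the increment representation are illustrative rather than MRPT types.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Pose increment reported by odometry for one time step.
struct OdoIncrement { double dx, dy, dphi; };

// Illustrative noise injection: zero-mean Gaussian noise added to each
// odometry increment, using our simulation parameters
// (std 0.05 m on x and y, std 0.05 degrees on the heading).
OdoIncrement addOdometryNoise(const OdoIncrement& in, std::mt19937& rng) {
    std::normal_distribution<double> xy(0.0, 0.05);                  // meters
    std::normal_distribution<double> phi(0.0, 0.05 * M_PI / 180.0);  // radians
    return {in.dx + xy(rng), in.dy + xy(rng), in.dphi + phi(rng)};
}
```

The range-and-bearing sensor noise is injected the same way, with the standard deviations listed in Table~\ref{tab:SimulationParameters}.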

\subsection{Switching Policy}

\begin{figure}[h]
 \centering
 \includegraphics[scale=.2,keepaspectratio=true]{./image/explore.jpg}
 % explore.jpg: 1092x368 pixel, 72dpi, 38.52x12.98 cm, bb=0 0 1092 368
 \caption{Exploration States}
 \label{fig:switchingPolicy}
\end{figure}

We implemented a 3-state state machine to control the movement policy of the robot, shown in Fig.~\ref{fig:switchingPolicy}. The state transitions are defined in terms of the robot state uncertainty, the landmark uncertainties, and the history of information gained from previous actions. The default state is \textit{Explore}, which chooses the predicted best point outside the known area to explore; the choice of point is described below. If the robot state uncertainty grows beyond a threshold, the state changes to \textit{Improve Localization} and the robot returns to a well-known point in its map to reduce state uncertainty. If the overall map uncertainty exceeds a threshold, the state changes to \textit{Improve Map} and the robot revisits known landmarks with high uncertainty. The switching thresholds are predetermined by the user and reflect the desired balance between exploration and exploitation; they also prevent infinite loops and map inconsistency.
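A minimal sketch of the transition rule, assuming (as described above) that localization uncertainty takes priority over map uncertainty; the names and the use of the covariance traces as the two test quantities are our illustration, not a verbatim excerpt of the implementation.

```cpp
#include <cassert>

// The three movement-policy states of Fig. 2.
enum class Mode { Explore, ImproveLocalization, ImproveMap };

// Transition rule: repair the robot's own localization first, then the
// map, and otherwise keep exploring. Thresholds are user-set constants.
Mode nextMode(double robotTrace, double mapTrace,
              double robotThresh, double mapThresh) {
    if (robotTrace > robotThresh) return Mode::ImproveLocalization;
    if (mapTrace > mapThresh)     return Mode::ImproveMap;
    return Mode::Explore;
}
```

With the thresholds of Table~\ref{tab:SimulationParameters} ($0.75$~m$^2$ for both), the robot keeps exploring until either trace exceeds $0.75$.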

\subsection{Uncertainty Measure}
The uncertainty measure is tied to the SLAM algorithm used. The classical approach, which we adopt, relies on Extended Kalman Filters (EKFs)~\cite{MPC}; the uncertainty is contained in the covariance matrix of the EKF, and the global goal is to minimize it. Possible norms include the trace, the determinant, and the maximum eigenvalue. The trace has been shown to measure the uncertainty most effectively for SLAM~\cite{MPC, MPCAOptimal}, so we focus on the trace.
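For completeness, the trace is just the sum of the diagonal of the covariance matrix, i.e., the sum of the per-dimension variances (the storage layout below is an assumption for the sketch):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trace of an n-by-n covariance matrix stored row-major: the sum of the
// diagonal entries, i.e., the per-dimension variances. This is the scalar
// uncertainty norm minimized throughout this work.
double trace(const std::vector<double>& cov, std::size_t n) {
    double t = 0.0;
    for (std::size_t i = 0; i < n; ++i) t += cov[i * n + i];
    return t;
}
```

Unlike the determinant, the trace stays informative when one dimension is already well estimated (a near-zero variance drives the determinant to zero regardless of the other dimensions), which is one reason it is preferred in~\cite{MPC, MPCAOptimal}.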

\subsection{Map Expansion}
We implemented frontier-based goal selection~\cite{RaoMPC} using the nearest attractive point~\cite{MPC} to determine the expected best point to move toward during the exploration state. This works by first evaluating the predicted information gain for regions just beyond the boundary of the known map. At the area of maximum predicted gain, we place an artificial attractor in the unknown region to bias the robot's movement toward this goal. These attractors only affect the robot's movement in the \textit{Explore} state.
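The selection step can be sketched as follows, under the assumption that ties in predicted gain are broken by distance to the robot (the "nearest attractive point"); the `Frontier` struct and gain values are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// A candidate frontier point just beyond the known map, annotated with its
// predicted information gain (hypothetical representation for the sketch).
struct Frontier { double x, y, predictedGain; };

// Among the frontier points with maximum predicted gain, pick the one
// nearest the robot; that point becomes the virtual attractor feature.
// Assumes `frontiers` is non-empty.
Frontier selectAttractor(const std::vector<Frontier>& frontiers,
                         double rx, double ry) {
    double maxGain = -std::numeric_limits<double>::max();
    for (const auto& f : frontiers)
        maxGain = std::max(maxGain, f.predictedGain);

    Frontier best = frontiers.front();
    double bestDist = std::numeric_limits<double>::max();
    for (const auto& f : frontiers) {
        if (f.predictedGain < maxGain) continue;  // keep only the best gain
        double d = std::hypot(f.x - rx, f.y - ry);
        if (d < bestDist) { bestDist = d; best = f; }
    }
    return best;
}
```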

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Evaluation}
We compare multiple MPC algorithms by running our SPLAM (Simultaneous Planning, Localization, and Mapping) implementation on two maps in the MRPT simulator. The small map contains $30$ landmarks randomly distributed over $400~\mathrm{m}^2$; the second contains $70$ landmarks randomly distributed over $900~\mathrm{m}^2$. The robot is allowed to wander away from the bounding box around the landmarks, but the attractive forces of visible landmarks always pull it back into the landmark region. All simulation parameters are listed in Table~\ref{tab:SimulationParameters}. We tested SPLAM with greedy, 2-step MPC, and 3-step MPC policies.

\begin{figure}[ht]
\centering
\subfigure[Start of greedy]{
\includegraphics[scale=.1]{./image/greedyStartSmall.png}
\label{fig:subfig1}
}
\subfigure[End of greedy]{
\includegraphics[scale=.1]{./image/greedyEndSmall.png}
\label{fig:subfig2}
}
\subfigure[Start of 3-step MPC]{
\includegraphics[scale=.1]{./image/MDP3StartSmall.png}
\label{fig:subfig3}
}
\subfigure[End of 3-step MPC]{
\includegraphics[scale=.1]{./image/MDP3EndSmall.png}
\label{fig:subfig4}
}
\caption{Results of SPLAM on the small map}
\label{fig:subfigureExample}
\end{figure}

\begin{figure}[ht]
\centering
\subfigure[Small Map]{
\includegraphics[scale=.6]{./image/smallMap.png}
\label{fig:subfig5}
}
\subfigure[Large Map]{
\includegraphics[scale=.55]{./image/bigMap.png}
\label{fig:subfig6}
}
\caption{Map trace (solid lines) and map coverage (dashed lines) over the course of the simulation}
\label{fig:data}
\end{figure}


The performance metrics are the map coverage after a maximum of $500$ time steps (if $100\%$ coverage is not reached earlier) and the remaining uncertainty in the map as defined by the covariance matrix of the EKF. In our simulations, one time step is $500$~ms. Visualization results are shown in Fig.~\ref{fig:subfigureExample} for the small map and Fig.~\ref{fig:example} for the large map. The estimated path is lighter, while the actual path is darker. Each landmark is marked with a $+$ sign, and the diameter of the ring around it indicates the uncertainty of that landmark. Lines emanating from the robot frame indicate landmarks within range of the robot's sensors. The graphs in Fig.~\ref{fig:data} show the map uncertainty along with the coverage percentage over the entire 500-step simulation on our two test maps for each implemented algorithm. The uncertainty is evaluated with the trace, a common norm choice.

\begin{table}[htbp]
	\centering
		\begin{tabular}{|r|l|}\hline
\multicolumn{2}{|c|}{\textbf{Dynamic constraints}} \\ \hline
Number of discretized actions & $9$ \\ \hline
Max speed & $2$ m/s \\ \hline
Max turn rate & $1$ rad/s \\ \hline
Min speed & $0$ m/s \\ \hline
Min turn rate & $-1$ rad/s \\ \hline
\multicolumn{2}{|c|}{\textbf{Odometry Error per Step}} \\ \hline
Noise std (XY) & $0.05$ m \\ \hline
Noise std ($\phi$) & $0.05$ degrees \\ \hline
\multicolumn{2}{|c|}{\textbf{Range and bearing sensor}} \\ \hline
Min sensor distance & $0.5$ m \\ \hline
Max sensor distance & $7$ m \\ \hline
Field of view & $360$ degrees \\ \hline
Noise std (range) & $1$ m \\ \hline
Noise std (Yaw) & $10$ degrees \\ \hline
\multicolumn{2}{|c|}{\textbf{Robot state thresholds}} \\ \hline
Max trace for Exploration & $0.75$ m$^2$  \\ \hline
Max trace for Improve Map & $0.75$ m$^2$ \\ \hline 
		\end{tabular}
	\caption{Simulation parameters}
	\label{tab:SimulationParameters}
\end{table}

\section{Discussion}
We proposed a new way to address the Active SLAM problem, using a Model Predictive Controller to approximate the intractable exact solution of the associated POMDP. In comparison to previous approaches based on a greedy algorithm, our solution can plan over a longer horizon at reasonable computational cost. Moreover, the state machine is a simple and effective way to handle the trade-off between exploration and exploitation. In Fig.~\ref{fig:subfig2}, the greedy algorithm is unable to efficiently cover the entire map because, lacking a planning horizon, it frequently loops on itself. In comparison, the 3-step MPC in Fig.~\ref{fig:subfig4} successfully fulfills the accuracy and coverage requirements. These results are also visible in Fig.~\ref{fig:subfig5}. Performance differences were much more pronounced on the big map than on the small one: on a big map, the greedy algorithm may never explore distant regions because of its inability to plan several steps ahead, and it tends to make the robot loop within local clusters of landmarks. In Fig.~\ref{fig:data}, the MPC with a 3-step look-ahead outperforms the other methods on all metrics; it reaches $100$\% coverage faster, and its trace converges to the minimum faster. This shows that our method outperforms previous ones.\\

For future work, we would like to replace the breadth-first search of the trajectory tree with an RRT search to reduce the computational complexity. Moreover, an RRT would allow us to select from a continuous set of actions.

\printbibliography
\end{document}
